Dataset README / Codebook


---------------
## 1. Overview
---------------
This README and codebook are designed to help you understand the structure, contents, and likely usage of the dataset `DDoS23072022.csv`. Judging by the file name, the dataset appears to relate to DDoS/DoS network traffic classification (attack vs. benign). The exact provenance and original documentation are not included here; please consult the original data owners/authors for authoritative descriptions.

-----------------------
## 2. Quick summary
-----------------------
- Rows: 845373
- Columns: 40
- Sample rows (header plus the first 10 data rows shown below):
```
Unnamed: 0,Flow ID, Source IP, Source Port, Destination IP, Destination Port, Protocol, Timestamp, Flow Duration, Total Fwd Packets,Total Length of Fwd Packets, Fwd Packet Length Max, Fwd Packet Length Min, Fwd Packet Length Mean,Flow Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Max, Flow IAT Min,Fwd IAT Total, Fwd IAT Mean, Fwd IAT Max, Fwd IAT Min, Fwd Header Length,Fwd Packets/s, Min Packet Length, Max Packet Length, Packet Length Mean, Average Packet Size, Avg Fwd Segment Size, Fwd Header Length.1,Subflow Fwd Packets, Subflow Fwd Bytes,Init_Win_bytes_forward, Init_Win_bytes_backward, act_data_pkt_fwd, min_seg_size_forward, Inbound, Label, label
98115,192.16.0.5-192.168.50.4-615-28754-17,192.16.0.5,615,192.168.50.4,28754,17,29:52.1,3,2,2944,1472,1472,1472.0,981000000.0,666666.6667,3.0,3,3,3,3.0,3,3,-2,666666.6667,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
137,192.16.0.5-192.168.50.4-0-0-0,192.16.0.5,0,192.168.50.4,0,0,29:52.1,117876168,25274,0,0,0,0.0,0.0,214.8356231,4654.905343,6258062,0,117876168,4664.114589,6258062,0,0,214.4114491,0,0,0.0,0.0,0.0,0,25274,0,-1,-1,0,0,1,LDAP,1
98988,192.16.0.5-192.168.50.4-900-42364-17,192.16.0.5,900,192.168.50.4,42364,17,29:52.1,2,2,2944,1472,1472,1472.0,1470000000.0,1000000.0,2.0,2,2,2,2.0,2,2,-2,1000000.0,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
35177,192.16.0.5-192.168.50.4-616-10537-17,192.16.0.5,616,192.168.50.4,10537,17,29:52.1,3,2,2944,1472,1472,1472.0,981000000.0,666666.6667,3.0,3,3,3,3.0,3,3,-2,666666.6667,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
55362,192.16.0.5-192.168.50.4-617-14928-17,192.16.0.5,617,192.168.50.4,14928,17,29:52.1,44,2,2944,1472,1472,1472.0,66900000.0,45454.54545,44.0,44,44,44,44.0,44,44,-2,45454.54545,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
89854,192.16.0.5-192.168.50.4-618-44206-17,192.16.0.5,618,192.168.50.4,44206,17,29:52.1,1,2,2944,1472,1472,1472.0,2940000000.0,2000000.0,1.0,1,1,1,1.0,1,1,-2,2000000.0,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
77321,192.16.0.5-192.168.50.4-900-39339-17,192.16.0.5,900,192.168.50.4,39339,17,29:52.1,1,2,2944,1472,1472,1472.0,2940000000.0,2000000.0,1.0,1,1,1,1.0,1,1,-2,2000000.0,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
95343,192.16.0.5-192.168.50.4-619-41557-17,192.16.0.5,619,192.168.50.4,41557,17,29:52.1,2,2,2944,1472,1472,1472.0,1470000000.0,1000000.0,2.0,2,2,2,2.0,2,2,-2,1000000.0,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
17137,192.16.0.5-192.168.50.4-620-46920-17,192.16.0.5,620,192.168.50.4,46920,17,29:52.1,1,2,2944,1472,1472,1472.0,2940000000.0,2000000.0,1.0,1,1,1,1.0,1,1,-2,2000000.0,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1
51354,192.16.0.5-192.168.50.4-621-38016-17,192.16.0.5,621,192.168.50.4,38016,17,29:52.1,1,2,2944,1472,1472,1472.0,2940000000.0,2000000.0,1.0,1,1,1,1.0,1,1,-2,2000000.0,1472,1472,1472.0,2208.0,1472.0,-2,2,2944,-1,-1,1,-1,1,LDAP,1

```
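
The header row above shows that many column names begin with a space (for example ` Source IP`), so name-based lookups can fail unless the names are normalized first. The following is a minimal loading sketch, assuming pandas is available. `load_flows` is a hypothetical helper name, and the inline sample reuses a few fields from the first data row instead of reading the real file:

```python
import io

import pandas as pd


def load_flows(source):
    """Read the CSV and strip surrounding whitespace from every column name."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip()
    return df


# Tiny in-memory stand-in for the real file (a subset of its columns).
sample = io.StringIO(
    "Unnamed: 0,Flow ID, Source IP, Protocol, Label, label\n"
    "98115,192.16.0.5-192.168.50.4-615-28754-17,192.16.0.5,17,LDAP,1\n"
)
df = load_flows(sample)
print(list(df.columns))

# For the real data, pass the file path instead:
# df = load_flows("DDoS23072022.csv")
```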
-------------------------------------
## 3. Column codebook (summary)
-------------------------------------
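A CSV codebook was generated as `DDoS23072022_codebook.csv`. Key columns and their inferred types are listed below. Many names in the raw CSV header carry a leading space (for example ` Source IP`), which this table does not make visible: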

| column                      | dtype   |   unique |   missing | inferred_role                        |
|:----------------------------|:--------|---------:|----------:|:-------------------------------------|
| Unnamed: 0                  | int64   |   506766 |         0 | Numeric feature                      |
| Flow ID                     | object  |   787636 |         0 | Categorical / string feature         |
| Source IP                   | object  |      970 |         0 | IP address / network identifier      |
| Source Port                 | int64   |    60957 |         0 | Numeric feature                      |
| Destination IP              | object  |     1034 |         0 | IP address / network identifier      |
| Destination Port            | int64   |    65533 |         0 | Numeric feature                      |
| Protocol                    | int64   |        3 |         0 | Numeric feature                      |
| Timestamp                   | object  |    12165 |         0 | Temporal / timestamp                 |
| Flow Duration               | int64   |   121046 |         0 | Numeric feature                      |
| Total Fwd Packets           | int64   |      256 |         0 | Numeric feature                      |
| Total Length of Fwd Packets | int64   |     3166 |         0 | Numeric feature                      |
| Fwd Packet Length Max       | int64   |     1538 |         0 | Numeric feature                      |
| Fwd Packet Length Min       | int64   |      539 |         0 | Numeric feature                      |
| Fwd Packet Length Mean      | float64 |     4394 |         0 | Numeric feature                      |
| Flow Bytes/s                | float64 |   126095 |        23 | Numeric feature                      |
| Flow Packets/s              | float64 |   122413 |         0 | Numeric feature                      |
| Flow IAT Mean               | float64 |   122378 |         0 | Numeric feature                      |
| Flow IAT Max                | int64   |   120017 |         0 | Numeric feature                      |
| Flow IAT Min                | int64   |     1866 |         0 | Numeric feature                      |
| Fwd IAT Total               | int64   |   112379 |         0 | Numeric feature                      |
| Fwd IAT Mean                | float64 |   112893 |         0 | Numeric feature                      |
| Fwd IAT Max                 | int64   |   111034 |         0 | Numeric feature                      |
| Fwd IAT Min                 | int64   |      497 |         0 | Numeric feature                      |
| Fwd Header Length           | int64   |      688 |         0 | Numeric feature                      |
| Fwd Packets/s               | float64 |   122179 |         0 | Numeric feature                      |
| Min Packet Length           | int64   |      519 |         0 | Numeric feature                      |
| Max Packet Length           | int64   |     1558 |         0 | Numeric feature                      |
| Packet Length Mean          | float64 |     5970 |         0 | Numeric feature                      |
| Average Packet Size         | float64 |     6062 |         0 | Numeric feature                      |
| Avg Fwd Segment Size        | float64 |     4394 |         0 | Numeric feature                      |
| Fwd Header Length.1         | int64   |      688 |         0 | Numeric feature                      |
| Subflow Fwd Packets         | int64   |      256 |         0 | Numeric feature                      |
| Subflow Fwd Bytes           | int64   |     3166 |         0 | Numeric feature                      |
| Init_Win_bytes_forward      | int64   |     1382 |         0 | Numeric feature                      |
| Init_Win_bytes_backward     | int64   |     1088 |         0 | Numeric feature                      |
| act_data_pkt_fwd            | int64   |      213 |         0 | Numeric feature                      |
| min_seg_size_forward        | int64   |       49 |         0 | Numeric feature                      |
| Inbound                     | int64   |        2 |         0 | Numeric feature                      |
| Label                       | object  |        6 |         0 | Possible target/label (check values) |
| label                       | int64   |        2 |         0 | Numeric feature                      |

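
A summary like the table above can be rebuilt with a few lines of pandas. This is a sketch of one way to do it, not necessarily how `DDoS23072022_codebook.csv` was produced. `summarize_columns` is a hypothetical helper, and the toy frame stands in for the real data:

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Build a per-column summary: dtype, distinct values, missing values."""
    return pd.DataFrame({
        "column": list(df.columns),
        "dtype": [str(t) for t in df.dtypes],
        "unique": [df[c].nunique(dropna=True) for c in df.columns],
        "missing": [int(df[c].isna().sum()) for c in df.columns],
    })


# Toy data only; values are illustrative.
toy = pd.DataFrame({
    "Protocol": [17, 17, 6, 0],
    "Flow Bytes/s": [981000000.0, None, 12.0, 0.0],
    "Label": ["LDAP", "LDAP", "BENIGN", "LDAP"],
})
summary = summarize_columns(toy)
print(summary.to_string(index=False))
```

Running the same function on the full file is a quick way to confirm the counts reported here, such as the 23 missing values in `Flow Bytes/s`.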

----------------------------------------------------
## 4. Numeric column summary (basic statistics)
----------------------------------------------------
Basic statistics (count, mean, std, min, 25%, 50%, 75%, max) for the numeric columns are shown below. See `DDoS23072022_codebook.csv` for per-column metadata.

```
,count,mean,std,min,25%,50%,75%,max
Unnamed: 0,845373.0,204389.65631502308,152480.4422030975,0.0,74597.0,173854.0,321573.0,665701.0
 Source Port,845373.0,29895.88823276826,23169.507131938208,0.0,5573.0,28874.0,52487.0,65534.0
 Destination Port,845373.0,31775.073154690297,19604.351097450348,0.0,14650.0,31851.0,48859.0,65535.0
 Protocol,845373.0,9.290229283405077,5.045906901464116,0.0,6.0,6.0,17.0,17.0
 Flow Duration,845373.0,5415786.444794191,16274021.83375521,0.0,1.0,1.0,103.0,119992888.0
 Total Fwd Packets,845373.0,3.177313446253902,37.34370113270998,1.0,2.0,2.0,2.0,25274.0
Total Length of Fwd Packets,845373.0,253.6943526703597,955.7751419681766,0.0,12.0,12.0,458.0,208524.0
 Fwd Packet Length Max,845373.0,112.13749788554874,234.3852939358063,0.0,6.0,6.0,229.0,3625.0
 Fwd Packet Length Min,845373.0,105.17409356579877,217.57384023624235,0.0,6.0,6.0,229.0,2033.0
 Fwd Packet Length Mean,845373.0,106.39387920699001,217.69672714485304,0.0,6.0,6.0,229.0,2033.0
Flow Bytes/s,845350.0,159886164.71541286,383056791.9654424,0.0,206896.5517,6000000.0,12000000.0,2940000000.0
 Flow Packets/s,845373.0,926038.1776170686,963216.2372353144,0.0,34482.75862,75471.69811,2000000.0,3000000.0
 Flow IAT Mean,845373.0,508497.837306033,1518320.3866839588,0.0,1.0,1.0,46.0,51200016.0
 Flow IAT Max,845373.0,2342509.4606227074,6948917.253031849,0.0,1.0,1.0,99.0,117569202.0
 Flow IAT Min,845373.0,2365.1499693034907,234632.96672191934,0.0,1.0,1.0,1.0,51200016.0
Fwd IAT Total,845373.0,5401801.649629217,16270179.623945674,0.0,1.0,1.0,46.0,119992888.0
 Fwd IAT Mean,845373.0,684368.7687602841,1945999.3979448455,0.0,1.0,1.0,46.0,51200016.0
 Fwd IAT Max,845373.0,2335318.6837833715,6947143.59163022,0.0,1.0,1.0,46.0,117616338.0
 Fwd IAT Min,845373.0,2302.5833200255984,234732.68095023825,0.0,1.0,1.0,1.0,51200016.0
 Fwd Header Length,845373.0,-27134445.22731031,239103145.67189577,-4250875888.0,40.0,40.0,40.0,429016.0
Fwd Packets/s,845373.0,920825.2843989158,967499.2188396202,0.0,17391.30435,42553.19149,2000000.0,3000000.0
 Min Packet Length,845373.0,105.13085702997375,217.4938572653785,0.0,6.0,6.0,229.0,1472.0
 Max Packet Length,845373.0,120.69242689321754,276.86328935397347,0.0,6.0,6.0,229.0,3627.0
 Packet Length Mean,845373.0,107.90920996038794,218.84033616697133,0.0,6.0,6.0,229.0,1721.142857
 Average Packet Size,845373.0,159.8935016130956,327.0103105408959,0.0,7.5,9.0,343.5,2208.0
 Avg Fwd Segment Size,845373.0,106.39387920699001,217.69672714485304,0.0,6.0,6.0,229.0,2033.0
 Fwd Header Length.1,845373.0,-27134445.22731031,239103145.67189577,-4250875888.0,40.0,40.0,40.0,429016.0
Subflow Fwd Packets,845373.0,3.177313446253902,37.34370113270998,1.0,2.0,2.0,2.0,25274.0
 Subflow Fwd Bytes,845373.0,253.6943526703597,955.7751419681766,0.0,12.0,12.0,458.0,208524.0
Init_Win_bytes_forward,845373.0,4168.455003885859,3611.2537612831784,-1.0,-1.0,5840.0,5840.0,65535.0
 Init_Win_bytes_backward,845373.0,120.04534211525564,2109.582449004073,-1.0,-1.0,-1.0,0.0,65535.0
 act_data_pkt_fwd,845373.0,1.9831506329158846,21.58765722231814,0.0,1.0,1.0,1.0,18766.0
 min_seg_size_forward,845373.0,-13562834.92199183,119287778.71499078,-1062718972.0,20.0,20.0,20.0,1480.0
 Inbound,845373.0,0.9630920315647649,0.18853597096763333,0.0,1.0,1.0,1.0,1.0
 label,845373.0,0.9630920315647649,0.18853597096763333,0.0,1.0,1.0,1.0,1.0

```
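
Some rows in the statistics above suggest checks worth running before modelling. `Fwd Header Length` and `Fwd Header Length.1` report identical statistics, as do `Inbound` and `label`. `Fwd Header Length` and `min_seg_size_forward` also have large negative minimums, which is unexpected for size fields. The sketch below shows how such checks might look, assuming pandas. `find_identical_columns` and `count_negative` are hypothetical helpers, and the toy frame is illustrative only:

```python
import pandas as pd


def find_identical_columns(df):
    """Return pairs of columns whose values match row by row."""
    cols = list(df.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if df[a].equals(df[b]):
                pairs.append((a, b))
    return pairs


def count_negative(df, columns):
    """Count negative entries per column (sizes and lengths should be >= 0)."""
    return {c: int((df[c] < 0).sum()) for c in columns}


# Toy data only; the real check would run on the loaded dataset.
toy = pd.DataFrame({
    "Fwd Header Length": [40, -2, 40],
    "Fwd Header Length.1": [40, -2, 40],
    "min_seg_size_forward": [20, -1, 20],
    "Inbound": [1, 1, 0],
    "label": [1, 1, 0],
})
identical = find_identical_columns(toy)
negatives = count_negative(toy, ["Fwd Header Length", "min_seg_size_forward"])
print(identical)
print(negatives)
```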
---------------------------------------------------------
## 5. Suggested next steps / preprocessing checklist
---------------------------------------------------------
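1. **Verify the target/label:** The codebook flags `Label` (object, 6 unique values) as a possible target, while `label` (int64, 2 unique values) looks like a binary encoding. Check the sample values in `DDoS23072022_codebook.csv` to confirm the true label column and its encoding (e.g., 'BENIGN' / 'DDoS' or 0/1). `Inbound` has the same mean and standard deviation as `label` in Section 4, so check whether the two columns are identical before using `Inbound` as a feature.  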
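2. **Timestamps:** Convert the `Timestamp` column to a datetime dtype and verify the timezone. The sample values (e.g., `29:52.1`) appear to hold only minutes and seconds, so the date and hour may have been lost in an earlier export; confirm against the original data.  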
3. **IP addresses / identifiers:** Consider anonymizing or excluding raw IP columns before sharing.  
4. **Imbalanced classes:** The mean of `label` is about 0.963, so roughly 96% of rows fall in class 1. Consider stratified sampling, resampling, or metrics suited to imbalance (F1, AUC).  
5. **Feature scaling:** Scale numeric features as needed (StandardScaler/MinMaxScaler) before applying distance-based or gradient-based models.  
6. **Categorical features:** Apply one-hot or ordinal encoding as appropriate.  
7. **Train/validation split:** Use time-based split for time-series data or stratified split for iid data.  

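
Several of the checklist items above (label verification, class balance, stratified splitting, and scaling) can be sketched with pandas alone. The helper names and toy data are hypothetical, and the 'BENIGN' to 0 mapping is an assumption made only for illustration. scikit-learn's `train_test_split(..., stratify=...)` and `StandardScaler` cover the same ground if that library is preferred:

```python
import pandas as pd


def label_mapping(df, text_col="Label", code_col="label"):
    """Cross-tabulate the string label against the numeric one."""
    return pd.crosstab(df[text_col], df[code_col])


def stratified_split(df, target, test_frac=0.2, seed=0):
    """Hold out the same fraction of each class (assumes a unique index)."""
    test = df.groupby(target).sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


def standardize(train, test, columns):
    """Scale using mean/std estimated on the training rows only."""
    train, test = train.copy(), test.copy()
    for c in columns:
        mean = train[c].mean()
        std = train[c].std(ddof=0) or 1.0  # guard against constant columns
        train[c] = (train[c] - mean) / std
        test[c] = (test[c] - mean) / std
    return train, test


# Toy data only; class names and values are illustrative.
toy = pd.DataFrame({
    "Flow Duration": [1, 2, 3, 44, 1, 2, 1, 1, 3, 2, 100, 200, 300, 400, 500],
    "Label": ["LDAP"] * 10 + ["BENIGN"] * 5,
    "label": [1] * 10 + [0] * 5,
})

mapping = label_mapping(toy)                            # one nonzero cell per row means a clean mapping
balance = toy["label"].value_counts(normalize=True)    # class proportions
train, test = stratified_split(toy, "label")
train_s, test_s = standardize(train, test, ["Flow Duration"])
print(mapping)
print(balance)
print(len(train_s), len(test_s))
```

Fitting the scaling statistics on the training rows only keeps information from the held-out rows out of the model inputs.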

